Abstract: The capacity achieving probability mass function 
(PMF) of a finite signal constellation with an average power 
constraint is in most cases non-uniform. A common approach 
to generate non-uniform input PMFs is Huffman shaping, which 
consists of first approximating the capacity achieving PMF by 
a sampled Gaussian density and then calculating the Huffman 
code of the sampled Gaussian density. The Huffman code is then 
used as a prefix-free modulation code. This approach has shown 
good results in practice but can lead to a significant gap 
to capacity. In this work, a method is proposed that efficiently 
constructs optimal prefix-free modulation codes for any finite 
signal constellation with average power constraint in additive 
noise. The proposed codes operate as close to capacity as desired. 
The major part of this work elaborates an analytical proof of this 
property. The proposed method is applied to 64-QAM in AWGN 
and numerical results are given, which show that, in contrast to 
Huffman shaping, the proposed method makes it possible to 
operate very close to capacity over the whole range of parameters. 

I. Introduction 

Reliable communication over a noisy channel at maximum 
rate is only possible if the input is distributed according to a 
capacity achieving distribution, i.e., a distribution for which 
the mutual information between channel input and channel 
output is maximum. 

In digital communication systems, the input is not continu- 
ous but has to be chosen from a discrete and finite constellation 
of signal points. In addition, a modulator has to generate the 
probability mass function (PMF) of the signal points from 
equiprobable binary input data. The idea to do so by prefix-free 
modulation codes originates in [1, Sec. IV.A]. Based on this 
idea, Huffman Shaping was developed in [2]-[4]. Huffman 
Shaping consists of two steps. First, the PMF of the signal 
points that minimizes the average energy subject to fixed 
entropy is chosen. The solution of this optimization problem is 
a sampled Gaussian density [2], [5, Sec. 4.1.2]. An equivalent 
formulation of this approach is to look for the signal point 
PMF that maximizes entropy subject to an average power 
constraint. Then, in a second step, the Huffman code of the 
obtained sampled Gaussian density is used as a prefix-free 
modulation code. However, Huffman Shaping is sub-optimal 
and can lead to non-trivial gaps to capacity [6, Sec. VIII.A], [5, 
Sec. 4.2.6]. The reason is that maximizing input entropy is in 
general not equivalent to maximizing the mutual information 
between input and output. Furthermore, the distance measure 
minimized by Huffman coding is not appropriate [7]. 

In this work, we propose a method to derive optimal prefix- 
free modulation codes for fixed signal constellations with an 
average power constraint and additive noise. We first show that 
for every fixed signal constellation, average power constraint, 
and additive noise density, the capacity achieving PMF is 
given by the solution of a convex optimization problem that 
can efficiently be solved numerically. We then use Geometric 
Huffman Coding [7] to find prefix-free modulation codes 
that approximate capacity achieving PMFs. We finally prove 
that our method approximates any capacity achieving PMF 
arbitrarily well both with respect to (w.r.t.) the resulting 
average power and mutual information. As an illustration of 
our results, we apply our method to 64-QAM in AWGN and 
observe that capacity can be approximated extremely well with 
prefix-free modulation over the whole range of the average 
power constraint. Our method differs from Huffman Shaping 
in two ways: first, we approximate the capacity achieving 
PMFs, which are very different from the sampled Gaussian 
density over a large range of the average power constraint. 
Second, the capacity achieving prefix-free modulation codes 
are not Huffman. 

The remainder of this work is organized as follows. In 
Section II, we state the problem of finding good prefix-free 
modulation codes. Capacity achieving PMFs are characterized 
in Section III. We then derive in Section IV the offset in 
mutual information that results from using a 'wrong' PMF. In 
Section V, we show how to find optimal prefix-free modulation 
codes. Finally, we present numerical results for 64-QAM in 
Section VI. 

II. Problem Statement 

Consider the discrete-time memoryless channel with additive 
noise given by $Y = X + Z$. The input $X$ takes values in 
the finite signal constellation set $\mathcal{X} \subset \mathbb{C}$ of cardinality $|\mathcal{X}| = m$. 
Input $X$ is subject to an average power constraint $E$ in terms 
of energy per channel use. The additive noise term $Z$ takes 
values in $\mathbb{C}$ and is distributed according to a density $h$. For 
each $i = 1, \dots, m$, conditioned on $x_i \in \mathcal{X}$, the channel output 
$Y$ is distributed according to $h_i(y) = h(y - x_i)$. 



Let the input $X$ be independent identically distributed (IID) 
according to a PMF $\mathbf{p} = (p_1, \dots, p_m)^T$. We denote the 
energy of the $i$th signal point by $w_i = |x_i|^2$ and define the 
energy vector as $\mathbf{w} = (w_1, \dots, w_m)^T$. The average power 
constraint $E$ can now be written as $\mathbf{w}^T\mathbf{p} \le E$. A capacity 
achieving PMF maximizes the mutual information between $X$ 
and $Y$. According to [8, Eq. (2.4.40)], the mutual information 
is given by 

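$$\mathbb{I}(\mathbf{p}) = \mathbb{H}(Y) - \mathbb{H}(Y|X) \tag{1}$$

$$= -\int_{\mathbb{C}} \Big[\sum_i p_i h_i(y)\Big] \log\Big[\sum_j p_j h_j(y)\Big]\,dy + \sum_i p_i \int_{\mathbb{C}} h_i(y) \log[h_i(y)]\,dy \tag{2}$$

where $\mathbb{H}(\cdot)$ denotes the entropy function. 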

In prefix-free modulation, the binary data stream 
$(B_k, k \in \mathbb{N})$ is parsed into words from a full prefix-free 
code, and each word is mapped to a signal point from $\mathcal{X}$ 
by a one-to-one mapping, see, e.g., [9]. The data bits 
are equiprobable and jointly independent. As a consequence, 
the parsed words are independent and identically distributed 
according to $p_i = 2^{-\ell_i}$, i.e., the probability to parse word $i$ 
is $2^{-\ell_i}$, where $\ell_i$ is the number of bits in the $i$th word. We 
define the set $\mathcal{D}$ of dyadic PMFs as 



$$\mathcal{D} = \{\mathbf{p} \mid \mathbf{p} \text{ is a PMF, for } i = 1, \dots, m: p_i = 2^{-\ell_i},\ \ell_i \in \mathbb{N}\}. \tag{3}$$

By the Kraft inequality [10, Theorem 5.2.1], any $\mathbf{p} \in \mathcal{D}$ can 
be generated by parsing the data stream with a prefix-free code with the 
corresponding codeword lengths. Searching for good prefix- 
free modulation codes is thus equivalent to searching for good 
dyadic input PMFs. The rest of this work is therefore about 
dyadic input PMFs, but keep in mind that for each dyadic input 
PMF, appropriate prefix-free modulation codes are at hand. 
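
The correspondence between dyadic PMFs and prefix-free codes can be made concrete with a small sketch. The code below is only an illustration under stated assumptions: the function names `canonical_prefix_code` and `parse` are hypothetical, the codewords are assigned in canonical order (one of many valid assignments), and the dyadic PMF (1/2, 1/4, 1/8, 1/8) is an arbitrary example that does not come from this work.

```python
from fractions import Fraction


def canonical_prefix_code(lengths):
    """Assign codewords with the given lengths (canonical prefix-free code)."""
    # For a dyadic PMF p_i = 2^(-l_i), the Kraft sum equals one (full code).
    if sum(Fraction(1, 2 ** l) for l in lengths) != 1:
        raise ValueError("lengths do not belong to a dyadic PMF")
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    words = [None] * len(lengths)
    code, prev = 0, lengths[order[0]]
    for i in order:
        code <<= lengths[i] - prev          # append zeros when the length grows
        words[i] = format(code, "0{}b".format(lengths[i]))
        code, prev = code + 1, lengths[i]
    return words


def parse(bits, words):
    """Parse a bit string into signal point indices, one per codeword."""
    lookup = {w: i for i, w in enumerate(words)}
    symbols, current = [], ""
    for b in bits:
        current += b
        if current in lookup:               # prefix-freeness makes this unambiguous
            symbols.append(lookup[current])
            current = ""
    return symbols


lengths = [1, 2, 3, 3]                      # dyadic PMF (1/2, 1/4, 1/8, 1/8)
words = canonical_prefix_code(lengths)
symbols = parse("0101101110", words)
```

With equiprobable independent input bits, word $i$ is parsed with probability $2^{-\ell_i}$, so the index sequence returned by `parse` follows the dyadic PMF.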

Restricting the input PMF p to the set of dyadic PMFs D, 
the discrete optimization problem becomes 

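$$\begin{aligned} \text{maximize} \quad & \mathbb{I}(\mathbf{p}) \\ \text{subject to} \quad & p_i = 2^{-\ell_i},\ \ell_i \in \mathbb{N},\ i = 1, \dots, m \\ & \mathbf{1}^T\mathbf{p} = 1, \\ & \mathbf{w}^T\mathbf{p} \le E \end{aligned} \tag{4}$$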



where $\mathbf{1}$ is an $m \times 1$ vector with all components equal to 
one. This is a non-linear optimization problem with an integer 
constraint, and no efficient algorithm to solve it is known. Our 
approach is therefore to first drop the integer constraint and 
to calculate capacity achieving (in general non-dyadic) PMFs 
and then to approximate these capacity achieving PMFs by 
dyadic PMFs. We will do so in the following sections. 

III. Capacity Achieving PMFs 

Mutual information is concave in $\mathbf{p}$, which can be seen as 
follows. The first term in (2) is an integral over a positively 
weighted sum of functions concave in $p_i$, and is thereby 
concave [11, Ch. 3.2]. The second term is a linear function 
in $\mathbf{p}$. Therefore, the mutual information is concave in $\mathbf{p}$. 



[Figure: capacity curve $C(E)$ over the average power $E$, annotated with $C'(E^*)(E - E^*)$ and $D(q\|q^*)$.]

Fig. 1. The shaded area below the capacity curve $C(E)$ is the region of valid 
operating points $\{(\mathbf{w}^T\mathbf{p}, \mathbb{I}(\mathbf{p})) \mid \mathbf{p} \text{ is an input PMF}\}$. Consider a capacity 
achieving PMF $\mathbf{p}^*$ and some PMF $\mathbf{p}$. If $p_i = 0$ whenever $p_i^* = 0$, then 
the corresponding operating points $(E^*, I^*)$ and $(E, I)$ relate as $I = I^* + 
C'(E^*)(E - E^*) - D(q\|q^*)$ where $q$ and $q^*$ are the output densities that 
correspond to $\mathbf{p}$ and $\mathbf{p}^*$, respectively. 

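For convenience, we replace the maximization of the mutual information (2) by 
minimization. This leads to the optimization problem 

$$\begin{aligned} \text{minimize} \quad & -\mathbb{I}(\mathbf{p}) \\ \text{subject to} \quad & \mathbf{1}^T\mathbf{p} = 1,\ \mathbf{p} \ge 0 \\ & \mathbf{w}^T\mathbf{p} \le E. \end{aligned} \tag{5}$$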



Since the objective function and the inequality constraints 
are convex and the equality constraint is affine, the above 
optimization problem is convex. Therefore, an optimal solu- 
tion can be calculated efficiently by numerical optimization 
methods [11]. We now explicitly evaluate the Karush-Kuhn-Tucker 
(KKT) conditions for (5). We refer to these conditions 
later in this work. The Lagrangian of the optimization problem 
(5) is given by 

$$L(\mathbf{p}, \boldsymbol{\mu}, \nu, \lambda) = -\mathbb{I}(\mathbf{p}) - \boldsymbol{\mu}^T\mathbf{p} + \nu(\mathbf{w}^T\mathbf{p} - E) + \lambda(\mathbf{1}^T\mathbf{p} - 1) \tag{6}$$



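with dual variables $\boldsymbol{\mu} \in \mathbb{R}^m$ and $\nu, \lambda \in \mathbb{R}$. Assuming primal 
feasibility, for each $1 \le i \le m$, the KKT conditions are 

$$\frac{\partial L(\mathbf{p}, \boldsymbol{\mu}, \nu, \lambda)}{\partial p_i} = -\frac{\partial \mathbb{I}(\mathbf{p})}{\partial p_i} - \mu_i + \nu w_i + \lambda = 0 \tag{7}$$

$$\mu_i p_i = 0, \quad \nu(\mathbf{w}^T\mathbf{p} - E) = 0 \tag{8}$$

$$\mu_i \ge 0, \quad \nu \ge 0. \tag{9}$$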



Denote by $\mathbf{p}^*, \boldsymbol{\mu}^*, \lambda^*, \nu^*$ a tuple that fulfills the KKT conditions. 
From dual feasibility (9) it follows that $\mu_i^* \ge 0$. 
For every $p_i^* > 0$, by complementary slackness (8), we have 
$\mu_i^* = 0$. Using these observations in (7) and rearranging the 
terms, we get 

$$\frac{\partial \mathbb{I}(\mathbf{p}^*)}{\partial p_i^*} \le \lambda^* + \nu^* w_i, \quad \text{with equality if } p_i^* > 0. \tag{10}$$
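
Condition (10) can be checked numerically on a toy example. The sketch below is not the numerical method used in this work: it swaps in a Blahut-Arimoto-type fixed-point iteration for a fixed multiplier $\nu$, and it makes several simplifying assumptions that are not from the source, namely a real-valued 4-point constellation, real unit-variance Gaussian noise, integration by a Riemann sum on a grid, and natural logarithms (so $\log e = 1$). It uses the standard expression $\partial \mathbb{I}(\mathbf{p})/\partial p_i = \int h_i \log(h_i/q)\,dy - \log e$ with $q = \sum_i p_i h_i$. All names are hypothetical.

```python
import math

X = [-3.0, -1.0, 1.0, 3.0]            # hypothetical real 4-point constellation
W = [x * x for x in X]                 # energies w_i = |x_i|^2
DY = 0.1                               # grid step for numerical integration
GRID = [-10.0 + DY * k for k in range(201)]
# h_i(y) = h(y - x_i) for unit-variance real Gaussian noise, sampled on GRID
H = [[math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi) for y in GRID]
     for x in X]


def output_density(p):
    """q(y) = sum_i p_i h_i(y) on the grid."""
    return [sum(pi * hi[k] for pi, hi in zip(p, H)) for k in range(len(GRID))]


def divergences(q):
    """d_i = integral of h_i(y) log(h_i(y) / q(y)) dy, in nats."""
    return [DY * sum(h * math.log(h / qk) for h, qk in zip(hi, q)) for hi in H]


def solve_for_multiplier(nu, max_iter=2000, tol=1e-10):
    """Fixed-point iteration p_i <- p_i exp(d_i - nu w_i), then normalize."""
    p = [1.0 / len(X)] * len(X)
    for _ in range(max_iter):
        g = [d - nu * w for d, w in zip(divergences(output_density(p)), W)]
        if max(g) - min(g) < tol:      # stationary when all p_i stay positive
            break
        p = [pi * math.exp(gi) for pi, gi in zip(p, g)]
        total = sum(p)
        p = [pi / total for pi in p]
    return p


nu = 0.1                               # assumed value of the multiplier
p_star = solve_for_multiplier(nu)
# Partial derivatives of the mutual information at p_star (log e = 1 in nats)
grad = [d - 1.0 for d in divergences(output_density(p_star))]
lam = grad[0] - nu * W[0]              # lambda* read off from the first point
residuals = [g - (lam + nu * w) for g, w in zip(grad, W)]
```

For this example all probabilities stay positive, so (10) should hold with equality for every signal point and the entries of `residuals` should vanish up to numerical error. The resulting `p_star` solves (5) for the power constraint $E = \mathbf{w}^T\mathbf{p}^*$ that corresponds to the chosen $\nu$.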



IV. Using the 'Wrong' PMF 

Suppose our average power constraint is $E^*$. Solving (5) for 
$E = E^*$ yields the corresponding capacity achieving PMF $\mathbf{p}^*$ 
and the mutual information $I^* = \mathbb{I}(\mathbf{p}^*)$. Our target operating 
point is thus $(E^*, I^*)$. If we use some other PMF $\mathbf{p}$ as an 
approximation of $\mathbf{p}^*$, the effective operating point is $(E, I)$, 
where $E = \mathbf{w}^T\mathbf{p}$ and $I = \mathbb{I}(\mathbf{p})$. We denote by $C(E)$ the 
capacity curve, i.e., the maximum mutual information that 
is achievable under the average power constraint $E$. 
Formally, 

$$C(E) = \mathbb{I}(\mathbf{p}), \quad \text{where } \mathbf{p} \text{ is a solution of (5) for the power constraint } E. \tag{11}$$

Note that $\mathbb{I}(\mathbf{p}^*) = C(E^*)$ but in general $\mathbb{I}(\mathbf{p}) \ne C(E)$. The 
following proposition shows how the effective operating point 
$(E, I)$ relates to the target operating point $(E^*, I^*)$ in terms 
of $\mathbf{p}$ and $\mathbf{p}^*$. A visualization of the proposition is given in 
Fig. 1. 

Proposition 1. Consider a capacity achieving PMF $\mathbf{p}^*$ and 
some input PMF $\mathbf{p}$. If 

$$p_i = 0 \text{ whenever } p_i^* = 0 \tag{12}$$

then the corresponding operating points $(E, I)$ and $(E^*, I^*)$ 
relate as follows: 

$$E = \mathbf{w}^T\mathbf{p}, \quad E^* = \mathbf{w}^T\mathbf{p}^*, \tag{13}$$

$$I = I^* + C'(E^*)(E - E^*) - D(q\|q^*) \tag{14}$$

where $q$ and $q^*$ are the output densities that correspond to $\mathbf{p}$ 
and $\mathbf{p}^*$, respectively, and where $C'(E)$ is the derivative of $C(E)$ 
w.r.t. $E$. $D(\cdot\|\cdot)$ denotes the Kullback-Leibler (KL) distance as 
defined in [10, Sec. 8.5]. 



Proof: The output density $q$ that results from using the 
input PMF $\mathbf{p}$ is given by 

$$q(y) = \sum_i p_i h_i(y). \tag{15}$$

Denote by $q$ and $q^*$ the output densities that result from using 
the input PMFs $\mathbf{p}$ and $\mathbf{p}^*$, respectively. We now have 



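$$I = \mathbb{I}(\mathbf{p}) = \sum_i p_i \int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)}{q(y)}\,dy \tag{16}$$

$$= \sum_i p_i \int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)\,q^*(y)}{q(y)\,q^*(y)}\,dy \tag{17}$$

$$= \sum_i p_i \int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)}{q^*(y)}\,dy + \sum_i p_i \int_{\mathbb{C}} h_i(y) \log\frac{q^*(y)}{q(y)}\,dy. \tag{18}$$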
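For the second summand in (18), we further get 

$$\sum_i p_i \int_{\mathbb{C}} h_i(y) \log\frac{q^*(y)}{q(y)}\,dy = \int_{\mathbb{C}} \Big(\sum_i p_i h_i(y)\Big) \log\frac{q^*(y)}{q(y)}\,dy = \int_{\mathbb{C}} q(y) \log\frac{q^*(y)}{q(y)}\,dy = -D(q\|q^*). \tag{19}$$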



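By simple calculus, we get in accordance with [8, Eq. (4.5.5)] 
for the partial derivatives of $\mathbb{I}(\mathbf{p})$ with respect to $p_i$ 

$$\frac{\partial \mathbb{I}(\mathbf{p})}{\partial p_i} = \int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)}{q(y)}\,dy - \log e. \tag{20}$$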

Using (10) and (20), we get for the integral term in the first 
summand in (18) 

$$\int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)}{q^*(y)}\,dy = \frac{\partial \mathbb{I}(\mathbf{p}^*)}{\partial p_i^*} + \log e \tag{21}$$

$$\le \lambda^* + \nu^* w_i + \log e \tag{22}$$

with equality if $p_i^* > 0$. Expectation w.r.t. $\mathbf{p}$ gives 

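$$\sum_i p_i \int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)}{q^*(y)}\,dy = \sum_i p_i (\lambda^* + \nu^* w_i + \log e) \tag{23}$$

$$= \lambda^* + \nu^* \sum_i p_i w_i + \log e \tag{24}$$

$$= \lambda^* + \nu^* \sum_i (p_i - p_i^* + p_i^*) w_i + \log e \tag{25}$$

$$= \lambda^* + \nu^* \sum_i p_i^* w_i + \log e + \nu^* \sum_i (p_i - p_i^*) w_i \tag{26}$$

$$= \sum_i p_i^* (\lambda^* + \nu^* w_i + \log e) + \nu^*(E - E^*) \tag{27}$$

$$= \sum_i p_i^* \int_{\mathbb{C}} h_i(y) \log\frac{h_i(y)}{q^*(y)}\,dy + \nu^*(E - E^*) \tag{28}$$

$$= I^* + \nu^*(E - E^*) \tag{29}$$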



where we have equality in (23) since, according to the assumption 
of the proposition, $p_i = 0$ whenever $p_i^* = 0$, and for 
all $i$ with $p_i^* > 0$, we have because of (10) equality in (22). 
In (27), we used $E = \mathbf{w}^T\mathbf{p}$ and $E^* = \mathbf{w}^T\mathbf{p}^*$, respectively. 
Using (29) and (19) in (18), we get 

$$I = I^* + \nu^*(E - E^*) - D(q\|q^*). \tag{30}$$

By [11, Ch. 5.6.3] it holds that 

$$\nu^* = \left.\frac{\partial C(E)}{\partial E}\right|_{E = E^*} = C'(E^*). \tag{31}$$

Using (31) in (30) gives the statement of the proposition. ■ 
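
Relation (30) lends itself to a numerical sanity check. The following sketch is an illustration only and rests on assumptions that are not part of this work: a real-valued 4-point constellation with real unit-variance Gaussian noise, integration by a Riemann sum on a grid, natural logarithms, and a Blahut-Arimoto-type iteration (swapped in here for a general convex solver) that returns a capacity achieving PMF for an assumed multiplier $\nu^* = 0.1$. All function names and the test PMF `p` are hypothetical.

```python
import math

X = [-3.0, -1.0, 1.0, 3.0]            # hypothetical real 4-point constellation
W = [x * x for x in X]                 # energies w_i = |x_i|^2
DY = 0.1                               # grid step for numerical integration
GRID = [-10.0 + DY * k for k in range(201)]
H = [[math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi) for y in GRID]
     for x in X]                       # h_i(y) sampled on the grid


def output_density(p):
    """q(y) = sum_i p_i h_i(y), cf. (15)."""
    return [sum(pi * hi[k] for pi, hi in zip(p, H)) for k in range(len(GRID))]


def divergences(q):
    """Integral of h_i log(h_i / q) dy for every i, in nats."""
    return [DY * sum(h * math.log(h / qk) for h, qk in zip(hi, q)) for hi in H]


def solve_for_multiplier(nu, max_iter=2000, tol=1e-10):
    """Fixed-point iteration p_i <- p_i exp(d_i - nu w_i), then normalize."""
    p = [1.0 / len(X)] * len(X)
    for _ in range(max_iter):
        g = [d - nu * w for d, w in zip(divergences(output_density(p)), W)]
        if max(g) - min(g) < tol:
            break
        p = [pi * math.exp(gi) for pi, gi in zip(p, g)]
        total = sum(p)
        p = [pi / total for pi in p]
    return p


def operating_point(p):
    """Return (E, I) = (w^T p, I(p)), with I(p) evaluated as in (16)."""
    energy = sum(pi * wi for pi, wi in zip(p, W))
    info = sum(pi * di for pi, di in zip(p, divergences(output_density(p))))
    return energy, info


def kl_density(q, q_ref):
    """KL distance D(q || q_ref) between two densities on the grid."""
    return DY * sum(a * math.log(a / b) for a, b in zip(q, q_ref))


nu_star = 0.1                          # assumed multiplier, equals C'(E*) by (31)
p_star = solve_for_multiplier(nu_star)
p = [0.4, 0.3, 0.2, 0.1]               # some 'wrong' PMF with the same support
E_star, I_star = operating_point(p_star)
E, I = operating_point(p)
D = kl_density(output_density(p), output_density(p_star))
predicted = I_star + nu_star * (E - E_star) - D   # right-hand side of (30)
```

In this setting `I` and `predicted` should agree up to the convergence tolerance of the iteration, and `I` never exceeds the tangent value `I_star + nu_star * (E - E_star)` because the KL distance is non-negative.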
Proposition 1 is formulated for the case where consecutive 
signal points are generated independently according to the 
same PMF $\mathbf{p}$. We now consider the case where $n$ consecutive 
signal points are generated according to a joint PMF $\mathbf{p}^{(n)}$. 
The resulting average power and mutual information per block 
become respectively 

$$E^{(n)} = \mathcal{E}(\mathbf{p}^{(n)}), \quad I^{(n)} = \mathbb{I}(\mathbf{p}^{(n)}) \tag{32}$$

where $\mathcal{E}(\mathbf{p}^{(n)})$ is defined as 

$$\mathcal{E}(\mathbf{p}^{(n)}) = \sum_{\mathbf{x} \in \mathcal{X}^n} p^{(n)}(\mathbf{x}) \|\mathbf{x}\|^2 \tag{33}$$

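where $\mathcal{X}^n$ denotes the Cartesian product of $n$ copies of $\mathcal{X}$ 
and where $\|\cdot\|$ denotes the Euclidean norm. Since the channel 
is memoryless, a capacity achieving joint PMF $\mathbf{p}^{(n)*}$ is the 
product of $n$ copies of some $\mathbf{p}^*$. As a consequence, $I^{(n)*} = 
nI^*$ and $E^{(n)*} = nE^*$. The capacity curve is $C^{(n)}(E^{(n)}) = 
nC(E^{(n)}/n)$. Consequently, we get for the derivative 

$$\frac{\partial C^{(n)}(E^{(n)*})}{\partial E^{(n)}} = \left.\frac{\partial\, nC(E^{(n)}/n)}{\partial E^{(n)}}\right|_{E^{(n)} = E^{(n)*}} \tag{34}$$

$$= n \cdot \frac{1}{n} \cdot C'\Big(\frac{E^{(n)*}}{n}\Big) = C'(E^*). \tag{35}$$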


For blocks of $n$ symbols, Proposition 1 now becomes 

$$I^{(n)} = I^{(n)*} + C'(E^*)(E^{(n)} - nE^*) - D(q^{(n)}\|q^{(n)*}) \tag{36}$$

where $q^{(n)}$ and $q^{(n)*}$ are the output densities that result from 
using the input PMFs $\mathbf{p}^{(n)}$ and $\mathbf{p}^{(n)*}$, respectively. Dividing 
by $n$ we get for the mutual information per channel use $I_n = 
I^{(n)}/n$ and the energy per channel use $E_n = E^{(n)}/n$ 

$$I_n = I^* + C'(E^*)(E_n - E^*) - \frac{D(q^{(n)}\|q^{(n)*})}{n}. \tag{37}$$

V. Optimal Dyadic PMFs 



For a given average power constraint E* , we want to find 
an operating point (E, I) that is close to the target operating 
point (E*,I* = C(E*)) both in terms of average power and 
mutual information. The following proposition gives sufficient 
conditions to accomplish this. 

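Proposition 2. Consider a capacity achieving PMF $\mathbf{p}^*$ and 
a sequence of PMFs $(\mathbf{p}_n, n \in \mathbb{N})$ where each PMF in the 
sequence fulfills condition (12). Assume further that $C(E)$ is 
strictly concave in $E$. Then 

$$(E_n, I_n) \xrightarrow{n \to \infty} (E^*, I^*) \tag{38}$$

if one of the following two properties holds: 

$$\text{Property 1:} \quad D(q_n\|q^*) \xrightarrow{n \to \infty} 0 \tag{39}$$

$$\text{Property 2:} \quad D(\mathbf{p}_n\|\mathbf{p}^*) \xrightarrow{n \to \infty} 0 \tag{40}$$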


Proof: The assumption that $C(E)$ is strictly concave in 
$E$ implies the sufficiency of Property 1. This can best be seen 
by considering the visualization of Proposition 1 in Fig. 1. As 
$D(q\|q^*)$ becomes smaller, $(E, I)$ approaches the tangent 
to the boundary at $(E^*, I^*)$. However, because the tangent 
is linear and the boundary is strictly concave, $(E, I)$ has to 
move toward $(E^*, I^*)$ as $D(q\|q^*)$ gets smaller. 
Otherwise, $(E, I)$ would lie above the boundary, which is 
impossible since, by the definition of the boundary, $I \le C(E)$. 

Sufficiency of Property 2 holds because the KL-distance 
between the output densities is upper bounded by the KL-distance 
between the input PMFs, i.e., 

$$D(q\|q^*) \le D(\mathbf{p}\|\mathbf{p}^*). \tag{41}$$

This can easily be shown along the lines of [10, Sec. 4.4]. 
Thus, Property 2 implies Property 1, and the sufficiency of 
Property 1 was shown in the first part of this proof. ■ 
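
Inequality (41) can be illustrated numerically. The sketch below is an example under assumptions that are not from this work: a real-valued 4-point constellation, real unit-variance Gaussian noise, integration by a Riemann sum on a grid, natural logarithms, and two arbitrarily chosen input PMFs (the reference PMF `p_ref` is not claimed to be capacity achieving). All names are hypothetical.

```python
import math

X = [-3.0, -1.0, 1.0, 3.0]            # hypothetical real 4-point constellation
DY = 0.1                               # grid step for numerical integration
GRID = [-10.0 + DY * k for k in range(201)]
H = [[math.exp(-0.5 * (y - x) ** 2) / math.sqrt(2.0 * math.pi) for y in GRID]
     for x in X]                       # h_i(y) sampled on the grid


def output_density(p):
    """q(y) = sum_i p_i h_i(y) on the grid."""
    return [sum(pi * hi[k] for pi, hi in zip(p, H)) for k in range(len(GRID))]


def kl_pmf(p, p_ref):
    """KL distance between two PMFs (terms with p_i = 0 contribute zero)."""
    return sum(a * math.log(a / b) for a, b in zip(p, p_ref) if a > 0.0)


def kl_density(q, q_ref):
    """KL distance between two densities sampled on the grid."""
    return DY * sum(a * math.log(a / b) for a, b in zip(q, q_ref))


p_ref = [0.1, 0.4, 0.4, 0.1]           # arbitrary reference input PMF
p = [0.4, 0.3, 0.2, 0.1]               # arbitrary second input PMF
d_in = kl_pmf(p, p_ref)
d_out = kl_density(output_density(p), output_density(p_ref))
```

Passing both PMFs through the same noisy channel blurs the difference between them, so `d_out` comes out smaller than `d_in`, in line with (41).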
We now come to the central point of this work, namely 
to approximate a target operating point $(E^*, I^*)$ by a dyadic 
PMF $\mathbf{p} \in \mathcal{D}$. By Proposition 1, we know that minimizing 
the KL-distance $D(q\|q^*)$ between the corresponding output 
densities maximizes mutual information in the sense that 
$(E, I)$ approaches the boundary $C(E)$, and furthermore, by 
Proposition 2, we know that if $D(q\|q^*)$ approaches zero, then 
$(E, I)$ converges to $(E^*, I^*)$ both in terms of average power 
and mutual information. By Property 2 in Proposition 2, we 
know that both effects can also be achieved by minimizing 
$D(\mathbf{p}\|\mathbf{p}^*)$. No algorithm is known that finds the dyadic PMF 
that minimizes $D(q\|q^*)$ with complexity polynomial in $m$. We 
therefore minimize the KL-distance between the input PMFs, 
i.e., the aim is to solve 



$$\underset{\mathbf{p} \in \mathcal{D}}{\operatorname{argmin}}\; D(\mathbf{p}\|\mathbf{p}^*). \tag{42}$$


As shown in [7], $\mathbf{p}$ can efficiently be found by Geometric 
Huffman Coding (GHC). For the definition of GHC and an 
implementation see [7], [12]. The complexity of GHC is 
$O(m \log m)$. In the following, $\mathbf{p} = \mathrm{GHC}(\mathbf{p}^*)$, i.e., $\mathbf{p}$ denotes the 
optimal dyadic approximation of the capacity achieving PMF 
$\mathbf{p}^*$. Because of the discrete nature of $\mathcal{D}$, $D(\mathbf{p}\|\mathbf{p}^*) > 0$ in most 
cases, and as a consequence, there is a non-zero gap between 
the reached operating point $(E, I)$ and the target operating 
point $(E^*, I^*)$. This gap can be made arbitrarily small by using 
GHC to approximate the capacity achieving joint PMF $\mathbf{p}^{(n)*}$ 
of $n$ consecutive signal points. Since $\mathbf{p}^{(n)} = \mathrm{GHC}(\mathbf{p}^{(n)*})$, by 
[7, Proposition 2], we have 

$$\frac{D(\mathbf{p}^{(n)}\|\mathbf{p}^{(n)*})}{n} \xrightarrow{n \to \infty} 0. \tag{43}$$



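Plugging this into (37), we get by the same concavity argument 
as in the proof of Proposition 2 the following. 

Proposition 3. Consider a capacity achieving PMF $\mathbf{p}^*$ and 
the corresponding capacity achieving joint PMFs $\mathbf{p}^{(n)*}$. Assume 
$C(E)$ is strictly concave in $E$. For 

$$\mathbf{p}^{(n)} = \mathrm{GHC}(\mathbf{p}^{(n)*}) \tag{44}$$

$$\text{and} \quad (E_n, I_n) = \Big(\frac{1}{n}\mathcal{E}(\mathbf{p}^{(n)}), \frac{1}{n}\mathbb{I}(\mathbf{p}^{(n)})\Big) \tag{45}$$

we have 

$$(E_n, I_n) \xrightarrow{n \to \infty} (E^*, I^*). \tag{46}$$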
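
This section defers the definition of GHC to [7], [12]. As a rough illustration, the sketch below implements a Huffman-like merge rule that is assumed here to correspond to GHC and should be checked against [7]: the two smallest weights $a \ge b$ are replaced by $a$ alone if $a \ge 4b$ (the smaller node then receives probability zero) and by $2\sqrt{ab}$ otherwise. The rule reflects that a parent of probability $P$ contributes $P \log(P/a)$ to $D(\mathbf{p}\|\mathbf{p}^*)$ when it keeps only the larger child and $P \log(P/(2\sqrt{ab}))$ when it splits evenly. The function name and the example PMFs are hypothetical, and the list concatenation used for bookkeeping is not tuned to reach $O(m \log m)$.

```python
import heapq
import math


def geometric_huffman(p_star):
    """Dyadic PMF from a Huffman-like tree with geometric-mean merging."""
    # Heap entries: (weight, unique tie-breaker, indices of symbols in the node)
    heap = [(w, i, [i]) for i, w in enumerate(p_star) if w > 0.0]
    heapq.heapify(heap)
    depth = [0] * len(p_star)               # codeword length per symbol
    dropped = [w <= 0.0 for w in p_star]    # symbols assigned probability zero
    counter = len(p_star)
    while len(heap) > 1:
        small, _, s_syms = heapq.heappop(heap)
        large, _, l_syms = heapq.heappop(heap)
        if large >= 4.0 * small:
            # Keeping only the larger node costs less than an even split.
            for i in s_syms:
                dropped[i] = True
            merged = (large, counter, l_syms)
        else:
            # Even split: every symbol below the new node gets one more bit.
            for i in s_syms + l_syms:
                depth[i] += 1
            merged = (2.0 * math.sqrt(small * large), counter, s_syms + l_syms)
        heapq.heappush(heap, merged)
        counter += 1
    return [0.0 if dropped[i] else 2.0 ** -depth[i] for i in range(len(p_star))]


def kl_bits(p, p_ref):
    """KL distance D(p || p_ref) in bits (terms with p_i = 0 contribute zero)."""
    return sum(a * math.log2(a / b) for a, b in zip(p, p_ref) if a > 0.0)


p_star = [0.4, 0.3, 0.2, 0.1]               # arbitrary example target PMF
p_dyadic = geometric_huffman(p_star)        # [0.5, 0.25, 0.125, 0.125]
gap = kl_bits(p_dyadic, p_star)
```

A dyadic target such as (1/2, 1/4, 1/8, 1/8) is returned unchanged, and a very skewed target such as (0.9, 0.09, 0.01) collapses to (1, 0, 0), which is compatible with condition (12). Applying the same function to the product PMF of $n$ copies of `p_star` gives the block-wise approximation used in Proposition 3.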



VI. Numerical Results: 64-QAM 

For illustration, we apply our algorithm to 64-QAM. 
The additive noise is zero-mean circularly symmetric 
white Gaussian of unit variance. The scaling of 64-QAM is 
specified through the highest signal point energy $\max|x|^2$. 

For $\max|x|^2 = 20$ and for the average power constraints 
$E = 2.5, 5, 10, 20$, the capacity achieving PMFs are displayed 
in Fig. 2. The PMFs are obtained by solving (5). For $E = 
2.5, 5$, the PMFs resemble the sampled Gaussian density, but 
for $E = 10$, the signal point probabilities no longer follow a 
monotonic function of the signal point energy. For $E = 20$, the 
average power constraint is no longer active and the resulting 
average power of the capacity achieving PMF is $E = 11.91$. 

Fig. 2. Capacity achieving PMFs for the 8x8 signal points of 64-QAM for the average power constraints $E = 2.5, 5, 10, 20$ (one panel per value of $E$). The probabilities of the 
signal points are given by the heights of the vertical bars. 

We now calculate the dyadic approximations of the capacity 
achieving operating points for $E = 2.5, 2.6, 2.7, \dots, 12$. For 
$E > 11.91$, the average power constraint is no longer active. In 
Fig. 3, the dyadic operating points are displayed. The dyadic 
operating points are very close to the capacity curve $C(E)$. 
This illustrates that minimizing $D(\mathbf{p}\|\mathbf{p}^*)$ gives good results 
in practice. However, the placement of dyadic operating points 
is irregular. So, for some optimal operating points, there is no 
close dyadic operating point. This problem is discussed next. 

Fig. 3. 

We now illustrate how a specific (capacity achieving) target 
operating point can be approximated closely by block 
modulation. The results are displayed in Fig. 4. We choose 
as a target operating point $(E^*, I^*) = (5.20, 1.81)$. Denote 
the corresponding capacity achieving PMF by $\mathbf{p}^*$. Using the 
dyadic PMF $\mathbf{p} = \mathrm{GHC}(\mathbf{p}^*)$, the resulting dyadic operating 
point is $(E_1, I_1) = (5.82, 1.90)$, which corresponds to 
approximation errors of 10.65% and 4.74% w.r.t. average power 
and mutual information, respectively. Exceeding the power 
constraint by more than 10% may be critical. Jointly modulating 
two consecutive signal points, i.e., using the joint dyadic 
PMF $\mathbf{p}^{(2)} = \mathrm{GHC}(\mathbf{p}^{(2)*})$, results in the dyadic operating 
point $(E_2, I_2) = (5.28, 1.82)$, which corresponds to exceeding 
the power constraint by 1.52% and the target mutual information by 0.55%. This is a 
significant improvement and in accordance with Proposition 3. 
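
The text does not state how the percentages are normalized. As a short worked check, the quoted figures are reproduced when each gap is divided by the reached dyadic value rather than by the target value; this normalization is an inference from the numbers, not a statement from the source, and the helper name `relative_gap` is hypothetical.

```python
def relative_gap(reached, target):
    """Gap in percent, normalized by the reached (dyadic) value."""
    return 100.0 * (reached - target) / reached


target = (5.20, 1.81)                  # (E*, I*)
single = (5.82, 1.90)                  # (E_1, I_1), one signal point per word
pair = (5.28, 1.82)                    # (E_2, I_2), two signal points per word

gaps_single = [round(relative_gap(r, t), 2) for r, t in zip(single, target)]
gaps_pair = [round(relative_gap(r, t), 2) for r, t in zip(pair, target)]
```

Normalizing by the target values instead would give larger figures for the single-symbol case, for example about 11.9% instead of 10.65% for the average power.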

For $\max|x|^2 = 10$, we display in Fig. 5 the capacity 
curve $C(E)$ and the mutual information curve $I_{SG}(E)$ that 
results from using sampled Gaussian densities for the input 
PMFs. For small $E$, both curves lie close together. However, 
the gap between them grows as $E$ increases. The capacity curve 
$C(E)$ reaches its maximum $I = 1.83$ for $E = 6.98$. For 
$E > 6.98$, the average power constraint is no longer active. 
Thus, $C(\infty) = 1.83$ is the capacity of the signal constellation 
without average power constraint. The curve $I_{SG}(E)$ reaches 
its maximum for $E = 6.50$. The corresponding mutual 
information differs from the capacity $C(\infty)$ by -4.55%. While 
this gap is rather small, it implies a bottleneck when the aim is 
to communicate at rates very close to capacity. Also, this gap 
may be much larger for other signal constellations and/or other 
noise densities. The dyadic approximation of $C(6.98)$ is within 
-0.39% of capacity $C(\infty)$, while the dyadic approximation of 
$I_{SG}(6.50)$ is within -4.52% of capacity $C(\infty)$. 

Fig. 4. 

Fig. 5. 